KUNLP System for NTCIR-4 Korean-English Cross-Language Information Retrieval
Abstract
This paper describes our Korean-English cross-language information retrieval system for NTCIR-4. Our system is based on a query translation approach that uses a bilingual dictionary and co-occurrence information between English terms in an English corpus. This year, we focused on the translation of unknown words. We expanded the existing bilingual dictionary by manually gathering Korean-English translation pairs for some Korean words from the Web. Other unknown words, not contained in the expanded bilingual dictionary, were automatically transliterated into English using a pre-constructed mapping table. Some issues in processing Korean queries and documents, such as the identification of Korean phrases, are also described. On the NTCIR-4 evaluation collections, the performance of our system is 30.25% for the description query type, 33.33% for the title query type, and 32.47% for the combined description and narrative query type under relax scoring. Post-submission experiments show that our expanded dictionary and transliteration mechanism improve the performance of our system.
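As a rough illustration of the query translation approach mentioned in the abstract, the sketch below picks one English candidate per Korean query term so that the chosen terms co-occur as often as possible. The abstract does not give the actual scoring function, so the summed pairwise count used here is an assumption, and the names `translate_query`, `DICTIONARY`, and `COOC`, along with the toy entries and counts, are hypothetical.

```python
from itertools import product

# Hypothetical toy bilingual dictionary: Korean term -> English candidates.
DICTIONARY = {
    "은행": ["bank", "ginkgo"],
    "이자": ["interest"],
}

# Hypothetical co-occurrence counts between English terms in an English corpus.
COOC = {
    frozenset(("bank", "interest")): 120,
    frozenset(("ginkgo", "interest")): 1,
}


def translate_query(terms, dictionary, cooc):
    """Choose one candidate per term, maximizing summed pairwise co-occurrence.

    Assumption: terms missing from the dictionary are passed through unchanged;
    the system described above would transliterate them instead.
    """
    candidate_lists = [dictionary.get(term, [term]) for term in terms]

    def score(combo):
        # Sum the co-occurrence counts over every unordered pair in the combination.
        return sum(
            cooc.get(frozenset((a, b)), 0)
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )

    # Exhaustive search over all combinations; fine for short queries only.
    return list(max(product(*candidate_lists), key=score))


print(translate_query(["은행", "이자"], DICTIONARY, COOC))
```

With these toy counts, the ambiguous term is resolved toward the candidate that co-occurs more often with the rest of the query. Exhaustive search grows exponentially with query length, so a real system would likely need a greedy or pruned selection.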
Similar resources
KUNLP System for NTCIR-3 English-Korean Cross-Language Information Retrieval
This paper describes the KUNLP system for the English-Korean cross-language information retrieval track in the NTCIR-3 workshop and some experiments after the workshop. A query translation method based on the bilingual dictionary and the document language corpus was used. To automatically transliterate some proper nouns such as Korean person names, Korean place names, and Korean company names, we have co...
NTCIR-4 Chinese, English, Korean Cross Language Retrieval Experiments Using PIRCS
In NTCIR-4 we participated in Korean, Chinese, English monolingual, Chinese-English, English-Korean bilingual, and Chinese-Korean cross language (using English as pivot) retrieval tasks based on our PIRCS retrieval system. The query translation approach was employed for CLIR. We combined two MT translations for Chinese-English, and two for English-Korean. For the latter, a web-based entity-orient...
Cross-Language IR at University of Tsukuba: Automatic Transliteration for Japanese, English, and Korean
This paper describes our cross-language information retrieval system for the NTCIR-4 CLIR task. Our system, which follows the query translation approach, uses compound word translation and transliteration. Transliteration is effective if a query includes foreign words, such as technical terms and proper nouns, spelled out in phonetic alphabets. We apply our method, which was originally propos...
Thomson Legal and Regulatory at NTCIR-4: Monolingual and Pivot-Language Retrieval Experiments
Thomson Legal and Regulatory participated in the CLIR task of the NTCIR-4 workshop. We submitted formal runs for monolingual retrieval in Japanese, Chinese and Korean. Our bilingual runs from Chinese and Korean to Japanese rely on English as a pivot language. During our monolingual experiments, we compared building stopword lists using query logs to building stopword lists from collection stati...
AINLP at NTCIR-6: Evaluations for Multilingual and Cross-Lingual Information Retrieval
In this paper, a multilingual cross-lingual information retrieval (CLIR) system is presented and evaluated in the NTCIR-6 project. We use language-independent indexing technology to process text collections in Chinese, Japanese, Korean, and English. Different machine translation systems are used to translate the queries for bilingual and multilingual CLIR. The experimental results...
Publication date: 2004